@elenash
Contributor

@elenash elenash commented Mar 16, 2015

...file

This commit introduces a new MCA parameter, mca_base_envar_file_prefix (set with -mca mca_base_envar_file_prefix), and a command line option, -tune. Either one specifies a single file, or a list of files separated by ",", that sets mca parameters and environment variables using the following syntax:

  • every line can hold zero or more "-x" or "-mca" args, with a single or double dash
  • supported patterns: -mca var val -mca var "val" -x var=val -x var (mca param values can be quoted; env vars must be unquoted)
  • whitespace is ignored (no matter how many extra spaces or tabs are in the line)
  • old (var=val) and new syntax are both supported, on different lines
  • if an arg is duplicated, its latest value is stored.
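
For readers who want to see the syntax rules in executable form, here is a minimal Python sketch of a line parser that follows them. It is not the parser from this PR (that one is C code inside Open MPI); the function name `parse_tune_line`, the dictionary layout, and the use of `None` to mark a bare `-x var` are all assumptions made for illustration.

```python
import shlex


def parse_tune_line(line, settings=None):
    """Parse one tune-file line into {'mca': {...}, 'env': {...}}.

    Hypothetical helper for illustration only; not Open MPI code.
    """
    if settings is None:
        settings = {"mca": {}, "env": {}}
    stripped = line.strip()
    if not stripped:
        return settings  # a line may hold zero args
    if not stripped.startswith("-"):
        # Old syntax: var=val sets an MCA parameter.
        name, sep, value = stripped.partition("=")
        if not sep:
            raise ValueError(f"unrecognized line: {line!r}")
        settings["mca"][name.strip()] = value.strip()
        return settings
    # shlex.split collapses runs of spaces/tabs and removes quotes.
    tokens = shlex.split(stripped)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-mca", "--mca") and i + 2 < len(tokens):
            settings["mca"][tokens[i + 1]] = tokens[i + 2]
            i += 3
        elif tok in ("-x", "--x") and i + 1 < len(tokens):
            name, sep, value = tokens[i + 1].partition("=")
            # "-x var" without "=" is stored as None: the value is
            # expected to come from the launching environment.
            settings["env"][name] = value if sep else None
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r} in {line!r}")
    # Later assignments overwrite earlier ones, so the latest value wins.
    return settings
```

Passing the same `settings` dictionary to successive calls accumulates a whole file, with later lines overriding earlier ones.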

Examples:

  1. cat app.conf
    -x b=2 -x a=3

cmd: -x a=1 -tune app.conf -> a=1 b=2

  2. cat app.conf
    -mca btl ^tcp

cmd: -tune app.conf -> -mca btl ^tcp

  3. cat app.conf
    --mca btl ^tcp

cmd: --mca btl tcp,self -tune app.conf -> --mca btl tcp,self

  4. cat app.conf
    -mca btl ^tcp

cat mca.conf
btl=^openib

cmd: -mca btl tcp,self -tune app.conf -am mca.conf -> -mca btl tcp,self

  5. cat app.conf
    -mca btl ^tcp

cat mca.conf
btl=^openib

cmd: -tune app.conf -am mca.conf -> -mca btl ^tcp

  6. export e=5
    cat app1.conf
    -x a=1 -x b=2 -x c=3

cat app2.conf
-x d=4 -x e -x c=8

cmd: -tune app1.conf,app2.conf -> a=1 b=2 c=8 d=4 e=5
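
The precedence these examples imply can be sketched in a few lines of Python. This is only a model inferred from the examples (command line over tune files, later tune files over earlier ones, tune files over the -am file, and a bare `-x var` taken from the environment); the function name `resolve` and the flat dictionaries are assumptions, and the examples do not say whether the tune-over-am result depends on option order.

```python
def resolve(cmdline, tune_files, am_files=(), environ=None):
    """Merge settings layers; hypothetical model, not Open MPI code.

    Each layer is a dict of name -> value. A value of None stands for
    a bare "-x var", i.e. "take the value from the environment".
    """
    environ = {} if environ is None else environ
    merged = {}
    # Lowest priority first; every later layer overrides earlier ones.
    for layer in (*am_files, *tune_files, cmdline):
        merged.update(layer)
    return {
        name: (environ.get(name) if value is None else value)
        for name, value in merged.items()
    }
```

For instance, `resolve({"a": "1"}, [{"b": "2", "a": "3"}])` models the case where `-x a=1` on the command line overrides `a=3` from the file.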

A conf file can be specified by absolute path, relative path, or just its name. As with the -am option, mca variables specify the paths to search.
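
As a rough illustration of that lookup, the sketch below tries an absolute path directly and otherwise walks a list of directories. The helper name `find_conf_file` and the idea of passing the directories as a plain list are assumptions; the actual search path comes from Open MPI's own MCA variables and is not reproduced here.

```python
import os


def find_conf_file(name, search_dirs):
    """Return the first existing path for `name`, or None.

    Hypothetical helper: `search_dirs` stands in for whatever
    directories the MCA search-path variables resolve to.
    """
    if os.path.isabs(name):
        return name if os.path.isfile(name) else None
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```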

This feature works properly only when the job is launched under mpirun; direct launch is not supported.

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/348/
Test PASSed.

@rhc54
Contributor

rhc54 commented Mar 16, 2015

I'm not sure I fully understand this proposed change. We already have the aggregate MCA file with option --amca, and we have the MCA param for passing envars. So isn't this just renaming -amca to -tune?

@elenash
Contributor Author

elenash commented Mar 16, 2015

This is not just a rename: the new option supports setting not only mca parameters from the file but env variables as well. It simplifies writing a profile with the best options (mca/env) for a specific application. Users can just copy the required mca and env parameters from the command line and paste them into the conf file. This is what Mike just mentioned in the thread related to direct launch.

@mike-dubman
Member

the proposed format is more user friendly than existing one:

  • it contains mpirun cmd line args that can be copy&pasted from the shell into the file w/o modifications. In an amca file one needs to convert
-mca var val

to

var=val

This is very hard to explain to end users: they cannot reliably complete the conversion w/o errors, and they cannot copy&paste from the mpirun command line as-is into a recipe file.

  • In the existing format, passing env variables is not an easy task. It is very easy to make mistakes, and hard to support and maintain. Example:

% cat hcoll_amca.conf
mca_base_env_list = HCOLL_BCOL=basesmuma,ptpcoll;HCOLL_SBGP=basesmuma,p2p;HCOLL_ML_USE_KNOMIAL_ALLREDUCE=1
%
  • So the new format provides a very easy way to specify MCA and ENV vars in a file. The file can represent a benchmark or an application, and its contents can be copy&pasted straight from the mpirun command line.

Example:

% cat imb.conf
-x MXM_TLS=rc,self,shm -x HCOLL_ALG=bruck 
-mca opal_rmaps_policy dist:mlx5_1:span
-mca fca_enable_caching 1
% mpirun -tune imb.conf imb.exe
  • Many major MPI distributions have similar capabilities which are very user-friendly and widely used.
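
To make the manual conversion burden concrete, here is a small Python sketch of what a user effectively has to do by hand to turn command line settings into the older amca form. The function name `to_amca_lines` is hypothetical; the `var=val` form and the ";"-separated `mca_base_env_list` line simply mirror the examples quoted in this comment.

```python
def to_amca_lines(mca, env):
    """Render settings in the older amca file style.

    Hypothetical helper: `mca` and `env` are plain dicts of
    name -> value as they would appear after -mca and -x.
    """
    lines = [f"{name}={value}" for name, value in mca.items()]
    if env:
        # All env vars are packed into one ";"-separated MCA value.
        packed = ";".join(f"{name}={value}" for name, value in env.items())
        lines.append(f"mca_base_env_list = {packed}")
    return lines
```

With the tune format none of this is needed, since the `-mca` and `-x` arguments go into the file unchanged.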

@rhc54
Contributor

rhc54 commented Mar 16, 2015

So what you are proposing is to create a new variation of the current -amca file that has a different syntax, but does essentially the same thing? Then to distinguish it, you would create a new cmd line option to orterun so we know which file type to expect?

I'm just trying to grok what you are proposing here. Since we wouldn't backport something like this to the 1.8 series, one question that springs to mind is: why not just modify the amca parser to handle this new syntax?

@mike-dubman
Member

you are right. The existing amca parser was modified to support the new syntax, and all the amca infra was reused.

we added "-tune" option to keep "-amca" backward compatibility and not mess with existing concepts (amca accepts colon as file list separator, tune accepts comma)
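
The separator difference can be shown with a tiny Python sketch. The function name `split_file_list` and the option keys are assumptions for illustration; only the separators themselves (comma for tune, colon for amca) come from this comment.

```python
# Separator per option, as described above (illustrative table).
LIST_SEPARATORS = {"tune": ",", "amca": ":"}


def split_file_list(option, value):
    """Split a file-list argument using the option's separator."""
    separator = LIST_SEPARATORS[option]
    # Drop empty entries produced by doubled or trailing separators.
    return [part for part in value.split(separator) if part]
```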

@rhc54
Contributor

rhc54 commented Mar 16, 2015

Thanks - I now grok your intent! I'll take a closer look.

@mike-dubman
Member

@rhc54 - could you please review? thanks

@rhc54
Contributor

rhc54 commented Mar 19, 2015

Jeff and others who were occupied this week asked for a chance to consider it; we will discuss it at next week's telecon. I think we're leaning towards just replacing the current amca parser with this one so we only have one such method.

@jsquyres
Member

We talked about this today on the call.

  1. This generally seems like a good idea.
  2. Does the file format support comments? E.g., "# this line is ignored". It seems like that would be a pretty important feature.
  3. Everyone on the call felt ok about moving forward to this new format as the "preferred" format, and therefore --tune should be preferred over --amca. So it would also be a good idea to put some kind of "Hey, I see you used --amca. Be aware that this option is deprecated; you should use --tune instead. The --amca option is likely to disappear in a future release..." kind of warning message.
  4. Can the mpirun man page be updated to reflect all this stuff?
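
The kind of deprecation notice requested in point 3 could look roughly like the Python sketch below. It only models the idea: the names `DEPRECATED` and `deprecation_messages` are hypothetical, the wording is illustrative, and the real implementation would live in Open MPI's C code rather than in anything like this.

```python
# Old option name -> preferred replacement (illustrative table).
DEPRECATED = {"amca": "tune"}


def deprecation_messages(argv):
    """Return one warning string per deprecated option in argv."""
    messages = []
    for arg in argv:
        name = arg.lstrip("-")
        if arg.startswith("-") and name in DEPRECATED:
            # Keep the same number of dashes the user typed.
            prefix = arg[: len(arg) - len(name)]
            messages.append(
                f"{arg} is deprecated and may disappear in a future "
                f"release; use {prefix}{DEPRECATED[name]} instead."
            )
    return messages
```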

@elenash
Contributor Author

elenash commented Mar 25, 2015

Nice! As for comments, all kinds are still supported (#, //, /**/). I just added new patterns to the existing parser, so it should remain backwards compatible.
BTW, the -amca option will work the same as -tune after this change, so if a conf file specified by the -amca option has lines in the new format (-mca ... -x ...), they will be parsed and handled as for the -tune option.
Sure, I'll update the mpirun man page.
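
For illustration, the three comment styles mentioned above can be stripped with a short Python sketch like the one below. This is a simplification and not how the existing parser works internally: the function name `strip_comments` is hypothetical, and the sketch ignores quoting, so a value that itself contains "#" or "//" would be cut short.

```python
import re


def strip_comments(text):
    """Remove /* */, # and // comments from conf-file text.

    Simplified sketch: does not protect quoted values.
    """
    # Block comments first; they may span several lines.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # Then everything from "#" or "//" to the end of the line.
    text = re.sub(r"(#|//).*", "", text)
    return "\n".join(line.rstrip() for line in text.splitlines())
```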

@jsquyres
Member

@elenash Great -- thank you! Can you put some kind of deprecated notice on the --amca switch? The issue is that the name "AMCA" implies that it only does MCA params (not all mpirun CLI params). So we might as well deprecate it now, and eventually get rid of it.

@elenash
Contributor Author

elenash commented Mar 25, 2015

@jsquyres Sure. In which release should amca be deprecated?

@jsquyres
Member

v1.9. We'll kill -amca in v2.1.

(i.e., we have to let it be deprecated for a whole series, and then we can kill it in the next series)

@elenash
Contributor Author

elenash commented Mar 25, 2015

@jsquyres Could you tell me which man page I should update? I'm a bit confused looking at so many files in ompi/mpi/man/man3/

@jsquyres
Member

I believe it's orte/tools/orterun/orterun.1in (note the "in" suffix -- orterun.1 is generated from orterun.1in).

@elenash
Contributor Author

elenash commented Mar 25, 2015

Ok, thanks, I will add it there. But I see there is no information about the -amca option there; probably another man file exists.

@jsquyres
Member

Well that's disappointing. I don't see it documented on any man page.

Oh well. Add docs for --tune and we'll be good.

@rhc54
Contributor

rhc54 commented Mar 25, 2015

It's definitely documented on the web site:

http://www.open-mpi.org/faq/?category=tuning#amca-param-files

@jsquyres
Member

Ah, good. Knew it had to be documented somewhere.

We should probably update that FAQ page, too (maybe hint that it will be deprecated starting with v1.9, to be replaced with -tune, etc.)

@elenash
Contributor Author

elenash commented Mar 25, 2015

There's some code for -amca option in orte/tools/orte-restart/orte-restart.c. I'm not sure I understand when it is used. Should I duplicate it for -tune?

@jsquyres
Member

Yes, probably so. Anywhere that handles -amca should probably handle -tune. Can you put the deprecation notice there, too? Probably an opal_show_help() kind of message about the deprecation.

@elenash
Contributor Author

elenash commented Mar 26, 2015

@jsquyres As far as I understand, orte-restart must eventually trigger the same flow as orte_init, so my warning message in orte/mca/plm/base/plm_base_launch_support.c will be hit. Am I wrong?

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/382/
Test PASSed.

@elenash
Contributor Author

elenash commented Mar 30, 2015

I updated the man page and added a warning message for amca. Please take a look.

@mike-dubman
Member

@elenash - could you please update FAQ with -tune examples?

@elenash
Contributor Author

elenash commented Mar 30, 2015

It looks like I don't have permissions to work with git@github.com:open-mpi/ompi-www.git

@jsquyres
Member

@elenash You do now. :-)

@elenash
Contributor Author

elenash commented Mar 31, 2015

Thanks!
I have one more comment. Currently to specify multiple files for -tune option I use comma delimiter (not colon as for -am option). Is it ok with you, guys?

@jsquyres
Member

I think so.

We've been fairly consistent about using "," for lists and ":" for paths, right?

I.e., is the -am option a list or a path?


@elenash
Contributor Author

elenash commented Mar 31, 2015

For the -am option, a list of paths is specified. OPAL_ENV_SEP, which is a colon, is used to split them. That's why I ask you :-)

@mike-dubman
Member

all set, FAQ will be following as well.

mike-dubman added a commit that referenced this pull request Apr 1, 2015
Introduce -tune command line option to set env vars and mca params from ...
@mike-dubman mike-dubman merged commit 58d0020 into open-mpi:master Apr 1, 2015
@jsquyres
Member

jsquyres commented Apr 1, 2015

Thanks! In the FAQ, @elenash please be sure to mention that this is for v1.9 and beyond.

@elenash
Contributor Author

elenash commented Apr 8, 2015

Updated FAQ:
open-mpi/ompi-www@dee79b8

jsquyres added a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015
CSCuv67889: usnic: fix an error corner case
@jssfy

jssfy commented Jan 7, 2021

When I was trying to pass environment variables via mca_base_envar_file_prefix, the remote process gave the following warning message (Process 4732 Unable to locate the variable file ....), but the variable "name" seemed to have been passed correctly.
The question is, is this warning a false warning or due to my wrong use of mca_base_envar_file_prefix?

Background info:
sh-4.2# /usr/local/openmpi-2.1.6-cuda9.0/bin/mpirun --allow-run-as-root -H 172.20.57.107 --npernode 1 --np 1 -mca mca_base_envar_file_prefix env.file --tag-output bash -c 'echo $name $$'

Process 4732 Unable to locate the variable file "/opt/yeanhua/env.file" in the following search path:
/opt/yeanhua:/usr/local/openmpi-2.1.6-cuda9.0/share/openmpi/amca-param-sets:/opt/yeanhua

[1,0]:yeanhua 4736

sh-4.2# cat env.file
-x name=yeanhua

sh-4.2# mpirun -V
mpirun (Open MPI) 2.1.6

PS:
local ip: 172.20.57.106
remote ip: 172.20.57.107
